Design and Implementation of a High-Performance Distributed Web Crawler

نویسندگان

  • Vladislav Shkapenyuk
  • Torsten Suel
چکیده

Broad web search engines as well as many more specialized search tools rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a web crawler may interact with millions of hosts over a period of weeks or months, and thus issues of robustness, flexibility, and manageability are of major importance. In addition, I/O performance, network resources, and OS limits must be taken into account in order to achieve high performance at

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Distributed High-Performance Web Crawler Based on Peer-to-Peer Network

Distributing the crawling activity among multiple machines can distribute processing to reduce the analysis of web page. This paper presents the design of a distributed web crawler based on Peer-to-Peer network. The distributed crawler harnesses the excess bandwidth and computing resources of nodes in system to crawl the web. Each crawler is deployed in a computing node of P2P to analyze web pa...

متن کامل

UbiCrawler: a scalable fully distributed Web crawler

We present the design and implementation of UbiCrawler, a scalable distributed web crawler, and we analyze its performance. The main features of UbiCrawler are platform independence, fault tolerance, a very effective assignment function for partitioning the domain to crawl, and more in general the complete decentralization of every task.

متن کامل

A Scalable, Distributed Web-Crawler*

In this paper we present a design and implementation of a scalable, distributed web-crawler. The motivation for design of such a system to effectively distribute crawling tasks to different machined in a peer-peer distributed network. Such architecture will lead to scalability and help tame the exponential growth or crawl space in the World Wide Web. With experiments on the implementation of th...

متن کامل

Design and Implementation of an Efficient Distributed Web Crawler with Scalable Architecture

Distributed Web crawlers have recently received more and more attention from researchers. Centralized solutions are known to have problems like link congestion, being a single point of failure ,while the fully distributed crawlers become an interesting architectural paradigm for its scalability, increased autonomy of nodes. This paper provides a distributed crawler system which consists of mult...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002